1. Introduction


This  Quarterly  Technical  Report,  Number  17,  describes 
aspects  of  our  work  performed  under  Contract  No.  MDA903-78-C-0356 
during  the  period  from  1  August  1982  to  31  October  1982.  This  is 
the  seventeenth  in  a  series  of  Quarterly  Technical  Reports  on  the 
design  of  a  packet  speech  concentrator,  the  Voice  Funnel. 

One of the major activities during this quarter has been the
production  of  hardware  for  four  ten-processor  Voice  Funnel 
systems.  In  this  Quarterly  Technical  Report,  we  give  a  physical 
and  block-diagram-level  description  of  the  three  major  components 
of the machine: the Processor Node, the Switch Node, and the I/O
Board.

2. Processor Node

The  Processor  Node  is  the  primary  computational  component  of 
the  Butterfly  Multiprocessor.  It  is  also  the  only  source  of 
memory  for  the  system.  Figure  1  is  a  photograph  of  a  Processor 
Node  printed  circuit  board.  Figure  2  is  a  simple  block  diagram 
of  the  Processor  Node. 

The  processor  of  the  Processor  Node  is  an  8  MHz  Motorola 
MC68000.  Virtual  addresses  generated  by  this  processor  are  24 
bits  long,  consisting  of  an  8-bit  segment  number  and  a  16-bit 
offset  within  the  segment.  All  of  the  memory  references 
generated  by  the  68000  are  virtual  addresses.  These  virtual 
addresses  are  translated  into  physical  addresses  by  the  Memory 
Management  Unit. 
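
The address split described above can be sketched in a few lines. This is only an illustration: the report gives the field widths (8-bit segment number, 16-bit offset) but not the bit positions, so placing the segment number in the high-order bits is an assumption, and the function name is hypothetical.

```python
SEGMENT_BITS = 8    # width of the segment number, from the text
OFFSET_BITS = 16    # width of the offset within the segment, from the text


def split_virtual_address(vaddr):
    """Split a 24-bit virtual address into (segment, offset)."""
    if not 0 <= vaddr < (1 << (SEGMENT_BITS + OFFSET_BITS)):
        raise ValueError("virtual address must fit in 24 bits")
    segment = vaddr >> OFFSET_BITS               # assumed: high-order 8 bits
    offset = vaddr & ((1 << OFFSET_BITS) - 1)    # assumed: low-order 16 bits
    return segment, offset
```

Under this assumed layout, each segment spans at most 64K bytes and a process can name at most 256 segments.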

The  Memory  Management  Unit  is  a  custom  implementation  for 
the  Butterfly,  built  out  of  MSI  components.  It  supplies  512 
Segment  Attribute  Registers  which  provide  memory  relocation  and 
protection  on  a  segment-by-segment  basis.  Because  there  are  512 
of  these  Registers,  it  is  possible  to  have  the  memory  management 
information  for  many  processes  in  the  Memory  Management  Unit  at 
the  same  time.  This  makes  process  switching  very  rapid. 
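
One way to picture why keeping many processes' registers resident makes switching cheap is the toy model below. It is a sketch under stated assumptions, not the actual hardware design: the register fields (base, length, writable), the idea of a per-process starting index into the 512 registers, and all class and method names are hypothetical.

```python
from dataclasses import dataclass

NUM_SARS = 512  # number of Segment Attribute Registers, from the text


class SegmentFault(Exception):
    """Raised for a reference outside any mapped segment (hypothetical)."""


@dataclass
class SegmentAttributeRegister:
    base: int       # physical base of the segment (assumed field)
    length: int     # segment length in bytes (assumed field)
    writable: bool  # protection bit (assumed field)


class MemoryManagementUnit:
    def __init__(self):
        self.sars = [None] * NUM_SARS
        self.sar_base = 0  # first register of the running process (assumed)

    def load_process(self, sar_base, segments):
        """Install one process's segment descriptors starting at sar_base."""
        for i, sar in enumerate(segments):
            self.sars[sar_base + i] = sar

    def switch_to(self, sar_base):
        """Process switch: only the starting index changes, nothing is reloaded."""
        self.sar_base = sar_base

    def translate(self, vaddr, write=False):
        """Relocate and protection-check a 24-bit virtual address."""
        segment, offset = vaddr >> 16, vaddr & 0xFFFF
        index = self.sar_base + segment
        sar = self.sars[index] if index < NUM_SARS else None
        if sar is None or offset >= sar.length:
            raise SegmentFault(hex(vaddr))
        if write and not sar.writable:
            raise PermissionError(hex(vaddr))
        return sar.base + offset
```

In this model, two processes loaded at different starting indices see different physical memory for the same virtual address, and switching between them costs a single index update.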

On the other side of the Memory Management Unit, the
Processor Node deals in physical addresses. A physical address
specifies a Processor Node number, subspace, and location within
the subspace. The possible subspaces are local memory, remote
memory, I/O, and a set of special registers for communicating
with the Processor Node Controller.

Processor Node
Figure 1

Block Diagram of Butterfly
Figure 2

The  Processor  Node  contains  an  8K  byte  EPROM  for  power-on 
diagnostics  and  a  low-level  debugger. 

All I/O devices in a Butterfly system are attached via the
I/O bus on some Processor Node; it is not possible for an I/O
device to attach directly to a switch port. Up to 4 I/O boards
may  be  attached  to  a  given  Processor  Node.  In  the  Voice  Funnel, 
the  majority  of  Processor  Nodes  have  no  I/O  devices  attached  to 
them. 


All  of  the  memory  in  a  Butterfly  Multiprocessor  is  located 
on  Processor  Nodes  in  the  system.  There  are  no  bulk  or  common 
memory  subsystems.  However,  each  Processor  Node  can  access  the 
memory  on  other  nodes  (subject  to  memory  protection  in  effect  at 
the  time).  Hence,  all  of  the  memory  in  the  machine  is  common 
memory.  The  current  version  of  the  Processor  Node  has  256K  bytes 
of  semiconductor  memory  on  board.  Additional  memory  can  be  added 
in  the  form  of  memory  daughter  boards  (using  the  memory  expansion 
connector)  up  to  a  total  of  4M  bytes.  The  memory  on  the 
Processor  Node  board  does  not  support  battery  backup  but  memory 
on daughter boards does.

The Switch Interface supports the transfer of requests and
replies  to  and  from  the  switch.  It  is  composed  of  two  separate 
finite  state  machines:  one  for  output  to  the  switch  and  one  for 
input  from  the  switch.  These  two  data  directions  are  also 
supported  by  independent  connectors  on  the  Processor  Node  board. 
The  switch  interface  interacts  with  the  rest  of  the  Processor 
Node  through  a  pair  of  dual  ported  memories.  When  a  message  is 
to  be  sent  out  across  the  switch,  a  parameter  block  is  set  up  in 
the  appropriate  dual  ported  memory,  and  the  output  finite  state 
machine  is  notified.  When  a  message  comes  in  from  the  switch, 
the  input  finite  state  machine  deposits  it  in  a  dual  ported 
memory  and  notifies  the  Processor  Node  Controller. 

On  the  left-hand  side  of  Figure  2  is  the  Processor  Node 
Controller (PNC). This is a 2901-based microcoded coprocessor
for the 68000. It is 16 bits wide and executes 8 million 64-bit
wide microinstructions per second from a 1K-word read-only
microstore.

The  Processor  Node  Controller  has  several  functions.  First, 
it  operates  the  various  control  wires  in  the  Processor  Node  that 
transfer  data  between  components  of  the  node  and  perform  special 
functions.  In  this  role,  the  PNC  is  involved  in  every  memory 
reference  made  by  the  68000.  It  controls  the  flow  of  the  address 
through  the  Memory  Management  Unit,  watches  for  reference  errors, 
and operates the memory system.

The second function of the PNC is to operate the switch.
Thus,  the  PNC  is  responsible  for  all  interactions  with  the  finite 
state  machines  in  the  Switch  Interface.  These  interactions  take 
many  forms.  The  simplest  occurs  when  the  68000  makes  a  reference 
to  a  word  of  memory  on  another  node.  In  this  case,  the  PNC  notes 
that  the  reference  is  to  remote  memory,  places  the  remote 
Processor  Node  number  and  memory  address  in  a  parameter  block, 
and  tells  the  output  finite  state  machine  to  start  the 
transaction.  While  the  message  is  en  route,  the  PNC  may  service 
I/O  interrupts,  or  other  microinterrupts,  but  the  68000  is  held 
up.  When  the  message  reaches  the  destination  Processor  Node,  the 
remote PNC makes the memory reference and sends back a reply
message. 

At  the  originating  node,  the  reply  finally  returns  and  the 
value  of  the  desired  memory  location  is  given  to  the  68000  as 
though  it  had  been  a  local  reference.  Because  the  hardware  is 
heavily  overlapped,  this  entire  remote  memory  reference  occurs  in 
less  than  4  microseconds. 
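
The request and reply sequence just described can be modeled in software as follows. This is a simplified sketch, not the PNC microcode: the class names, the dictionary used as a parameter block, and the direct method call standing in for the switch are all assumptions made for illustration.

```python
class ProcessorNode:
    def __init__(self, number, machine):
        self.number = number
        self.machine = machine
        self.memory = {}  # address -> word; stands in for local memory

    def read(self, node_number, address):
        """Read a word as the 68000 sees it; local and remote look the same."""
        if node_number == self.number:
            return self.memory.get(address, 0)
        # The PNC notes the reference is remote and fills in a parameter block.
        request = {"source": self.number, "dest": node_number, "address": address}
        # The 68000 is held up until the reply returns.
        reply = self.machine.deliver(request)
        return reply["value"]

    def serve(self, request):
        """Remote PNC makes the memory reference and sends back a reply."""
        value = self.memory.get(request["address"], 0)
        return {"dest": request["source"], "value": value}


class Machine:
    def __init__(self, node_count):
        self.nodes = [ProcessorNode(n, self) for n in range(node_count)]

    def deliver(self, request):
        # Stand-in for the switch: hand the request to the destination node.
        return self.nodes[request["dest"]].serve(request)
```

The point of the model is that the caller of read cannot tell whether the word came from its own node or from another one; only the latency differs.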

In  addition  to  single  word  transfers,  the  PNC  can  be 
instructed  to  transfer  blocks  of  memory  between  any  two  nodes  in 
the  machine.  These  transfers  can  happen  at  high  speed,  being 
limited  only  by  the  bandwidth  of  the  switch  (32  MHz)  and  the 
memory of the Processor Node (about 42 MHz).


Another function of the PNC is to augment the functionality
of  the  68000  for  the  multiprocessor  environment.  This  is  done  by 
programming  the  PNC  to  perform  a  variety  of  indivisible  primitive 
operations.  For  example,  the  PNC  is  able  to  post  events  without 
involving  any  system  locks.  The  68000  tells  the  PNC  to  post  an 
event  by  writing  the  address  of  a  parameter  block  into  a  special 
memory  location  that  traps  to  the  microcode.  This  causes  the  PNC 
to  send  a  special  message  to  the  destination  node  that  specifies 
the  location  of  the  data  structure  associated  with  the  event.  At 
the  destination  node,  the  PNC  stops  all  other  memory  references 
and  updates  the  event  data  structure.  If  the  state  of  the  posted 
process  is  such  that  it  should  run  immediately,  the  destination 
PNC  also  invokes  the  process  scheduler  on  that  node. 

Other  special  functions  implemented  by  the  PNC  include 
real-time  clocks,  indivisible  add-to-memory  instructions,  and 
dual-queue  functions. 
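
To illustrate why an indivisible add-to-memory primitive is useful in a multiprocessor, the sketch below models one in software. It is only an analogy: the lock stands in for the indivisibility that the PNC provides in microcode, and the 16-bit word size, the return of the previous value, and the class name are assumptions not stated in this report.

```python
import threading


class SharedWord:
    """A memory word supporting an indivisible add (software model)."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # models PNC indivisibility only

    def add(self, delta):
        """Indivisibly add delta and return the previous value (assumed)."""
        with self._guard:
            old = self._value
            self._value = (old + delta) & 0xFFFF  # 16-bit word assumed
            return old

    @property
    def value(self):
        return self._value
```

With such a primitive, several processes can each claim a distinct slot number by adding 1 to a shared counter, without taking a system lock around the counter.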

Just  recently,  we  have  coded  the  process  scheduler  in 
microcode  to  improve  the  performance  of  the  machine  during 
context  swaps.  This  will  be  reported  in  the  next  Quarterly 
Technical  Report. 

Finally,  as  the  photograph  shows,  the  Processor  Node 
contains  an  on-board  switching  power  supply.  This  supply  is 
controlled  by  a  switch  located  just  below  the  power  connector. 
If the power supplies on the Processor Node and any attached I/O
and memory boards all agree that they are supplying the correct
voltages, then a green light will come on. The red light next to
it  is  controlled  by  the  software.  It  comes  on  at  power  up  or  on 
an  error  and  is  turned  off  when  the  Processor  Node  has  passed  a 
firmware  diagnostic. 


3. Switch Node

The Processor Nodes in a Butterfly Multiprocessor are
interconnected  by  an  array  of  Switch  Nodes  as  shown  in  Figure  3. 
Each  of  the  Switch  Nodes  is  a  single  printed  circuit  board  as 
shown  in  Figure  4. 

A  Switch  Node  is  a  four-input  four-output  crossbar  switch. 
In  Figure  3,  each  input  to  a  Switch  Node  on  the  left-hand  side  of 
the  switch  is  connected  to  the  output  of  a  Processor  Node. 
Similarly,  each  output  from  a  Switch  Node  on  the  right-hand  side 
of  the  switch  is  connected  to  the  input  of  a  Processor  Node. 
Data  flows  through  the  switch  from  left  to  right. 

As shown in the photograph, there are eight 26-pin
connectors  arranged  in  two  columns.  As  a  result,  the  Switch  Node 
is  a  dual  width  card.  The  top  four  connectors  (towards  the  left 
in  the  picture)  are  inputs  to  the  Switch  Node.  The  bottom  four 
are  outputs  from  the  switch.  It  is  interesting  to  note  that  the 
inputs  to  a  Switch  Node  are  identical  in  function.  Thus  when 
wiring  a  machine,  there  is  no  need  to  distinguish  between  input 
ports.  The  output  ports  are  unique,  however,  in  that  the  address 
of  a  packet  determines  which  output  port  is  used. 
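
A minimal sketch of address-directed routing through a network of four-output Switch Nodes follows. The report states only that the packet address selects the output port; the choice of two address bits per stage, the use of the low-order bits first, and the function name are assumptions made for illustration.

```python
PORTS_PER_SWITCH = 4   # four outputs per Switch Node, from the text
BITS_PER_STAGE = 2     # log2(4) address bits select one of four outputs


def output_ports(dest_node, stages):
    """Return the output port a packet takes at each stage of the switch."""
    if not 0 <= dest_node < PORTS_PER_SWITCH ** stages:
        raise ValueError("destination does not fit in this many stages")
    ports = []
    for _ in range(stages):
        ports.append(dest_node & (PORTS_PER_SWITCH - 1))  # assumed: low bits first
        dest_node >>= BITS_PER_STAGE
    return ports
```

For a hypothetical two-stage arrangement serving up to 16 Processor Nodes, a packet for node 9 (binary 1001) would leave the first Switch Node on port 1 and the second on port 2. Because the route depends only on the destination, the input port a packet arrives on never matters, which is consistent with the inputs being interchangeable.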

In  addition  to  providing  data  routing  through  the  switch, 
the  Switch  Node  also  distributes  clock  and  system-wide  reset 
signals to the Processor Nodes. The Switch Node uses the clock
to  regenerate  signals  passing  through  the  switch.  In  the 
process,  it  delays  the  signal  by  one  full  clock  pulse.  The  clock 
and  reset  signals  are  sent  to  all  of  the  Switch  Nodes  by  a 
network  of  clock  cards. 

The  Switch  Node  has  three  major  sections:  the  power  supply, 
the  logic  that  implements  the  switch  functions,  and  a  set  of  ECL 
drivers  and  receivers.  The  switch  logic  implements  the  routing, 
timing,  and  collision  resolution  processing  needed  to  route 
packets  reliably  through  a  4-by-4  crossbar.  The  power  supply  is 
similar  to  those  used  on  other  Butterfly  boards. 

The  purpose  of  the  ECL  drivers  and  receivers  is  to  give  the 
Butterfly  a  certain  degree  of  immunity  from  ground  reference 
problems.  In  large  machines,  there  may  be  significant  distance 
between  Switch  Nodes  or  between  Switch  Nodes  and  Processor  Nodes. 
In  order  to  operate  reliably  at  high  speeds,  all  of  the  signals 
on  switch  cables  are  driven  by  differential  ECL  drivers  and 
receivers. 

The switch logic is currently implemented with TTL and
Schottky TTL MSI circuitry. We have made a prototype version of
this  circuitry  in  N-MOS  VLSI  using  the  DARPA  MOSIS  IC  fabrication 
facility.  Under  the  next  contract,  we  will  be  developing  the 
VLSI  implementation  for  use  in  future  Switch  Nodes. 


4. I/O Board

The  I/O  board  that  we  have  developed  for  the  Voice  Funnel 
application  supports  a  set  of  four  character-asynchronous  I/O 
channels,  and  four  high-speed  synchronous  I/O  channels.  The 
asynchronous  channels  operate  at  speeds  up  to  38.4  kilobits  per 
second,  and  service  the  low-speed  I/O  requirements  of  the  Voice 
Funnel,  such  as  terminals  and  low-speed  load  devices.  The 
synchronous  channels  operate  at  speeds  up  to  2  megabits  per 
second  and  are  used  to  service  high-bandwidth  devices  such  as  the 
Lexnet Concentrator Interface and the PSAT. Figure 5 is a
photograph of the I/O board printed circuit board. Figure 6 shows a simple
block  diagram  of  the  Butterfly  I/O  board. 
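
As a rough worked example of the load these channels can present, the sketch below adds up the peak line rates quoted above. It is a back-of-the-envelope calculation only: it counts one direction of transfer, assumes every channel runs at its maximum rate at once, and the function name is hypothetical.

```python
ASYNC_CHANNELS = 4         # asynchronous channels per board, from the text
SYNC_CHANNELS = 4          # synchronous channels per board, from the text
ASYNC_MAX_BPS = 38_400     # 38.4 kilobits per second
SYNC_MAX_BPS = 2_000_000   # 2 megabits per second


def peak_line_rate_bps(boards=1):
    """Peak one-way line rate, in bits per second, for a number of I/O boards."""
    per_board = ASYNC_CHANNELS * ASYNC_MAX_BPS + SYNC_CHANNELS * SYNC_MAX_BPS
    return boards * per_board
```

One board comes to 8,153,600 bits per second, almost all of it from the synchronous channels, and the maximum of four boards on a single Processor Node comes to 32,614,400 bits per second.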


To  meet  the  bandwidth  requirements  of  the  Voice  Funnel 
application,  it  was  necessary  to  implement  a  fairly  sophisticated 
DMA  mechanism  for  transferring  data  between  the  synchronous  I/O 
channels  and  the  main  memory  of  the  Processor  Node.  To  support 
the  required  operations,  the  I/O  board  uses  a  custom  bit-slice 
microprocessor  based  on  the  AMD2901  family  of  chips.  The 
operation  of  this  micromachine  is  described  in  Quarterly 
Technical Reports 7 and 10 (BBN reports 4564 and 4816). About
one third of the I/O board is devoted to the micromachine and
associated hardware. The vertical edge connector near the rear
of the board allows the attachment of a writable control store
(WCS) which is used for microcode development and debugging. The
four-chip set that makes up the Central Processing Unit of the
micromachine is located to one side of the WCS connector. The
PROM, RAM, and miscellaneous support circuitry that make up the
rest of the machine are spread out in front of the connector.

I/O Board
Figure 5

Block Diagram of Butterfly I/O Board
Figure 6

The second major section of the board is the hardware that
implements  the  individual  I/O  channels.  Associated  with  each 
asynchronous  channel  is  a  Signetics  2661  UART  and  a  set  of  EIA 
RS-232  drivers  and  receivers.  In  addition  to  transmit  and 
receive  data,  the  asynchronous  channels  support  various  modem 
control  signals.  All  of  these  lines  come  off  the  board  through  a 
34-pin  header  which  mates  to  a  mass  terminated  ribbon  cable. 
Associated  with  each  synchronous  channel  is  a  Signetics  2652 
serial  communications  controller  and  a  set  of  EIA  RS-422  drivers 
and  receivers.  The  2652  implements  much  of  the  necessary  framing 
for  the  bit  level  protocol  used  by  the  Butterfly.  Each 
synchronous  channel  supports  transmit  and  receive  data  signals, 
plus  transmit  and  receive  clocks.  All  of  these  lines  come  off 
the  board  through  a  40-pin  header  which  mates  to  a  mass 
terminated  ribbon  cable. 

Also on the Butterfly I/O board is a set of DIP switches.
These are used to establish the address of the board on the
Butterfly I/O link (BIOlink). The legal combinations for these
switches are shown in Figure 7. The designations "9N" and "10N"
are the ones that are silkscreened onto the board. Switch 10N is
the one that is closest to the BIOlink connector. The numbering
of the individual switches corresponds to the markings on the
switches themselves. Note that only thirteen of the sixteen
switches are used. Note also that there are only four legal
combinations, one for each possible position on the BIOlink.
Using more than the minimum number of switches avoids the need
for extra decoding logic that would have impacted the bandwidth
of the BIOlink.

Butterfly I/O Board Dip Switches
Figure 7


The remainder of the logic on the board supports the
Butterfly I/O link (BIOlink). The BIOlink can be used to connect
up to four I/O boards to a Butterfly Processor Node. The
required signals come off the I/O board through a 50-pin
connector header which mates to a mass terminated ribbon cable.

Like the Processor and Switch Nodes, the I/O board contains
an on-board switching power supply. There is no mechanical
switch to control this supply. Instead, it is controlled by
the Processor Node to which it is attached. There is also no
power indicator on the I/O board. If the I/O board power supply
is not supplying the correct voltages, the power indicator light
on the Processor Node to which it is attached will not come on.
The  power  connector  for  the  I/O  board  is  located  on  the  same  edge 
of  the  board  as  the  I/O  connectors.  This  allows  all  external 
connections  to  be  made  from  the  front  of  the  rack. 